- Introductions
- Class overview
- Getting R up and running
[Photo by Belinda Fewings on Unsplash]
Poll: How are you feeling right now?
Carrie Wright (she/her)
Senior Staff Scientist, Fred Hutchinson Cancer Center
Associate, Department of Biostatistics, JHSPH
PhD in Biomedical Sciences
Email: cwrigh60@jhu.edu Web: https://carriewright11.github.io
Ava Hoffman (she/her)
Senior Staff Scientist, Fred Hutchinson Cancer Center
Associate, Department of Biostatistics, JHSPH
PhD in Ecology
Email: ava.hoffman@jhu.edu Web: https://avahoffman.com
Clif McKee (he/him)
Research Associate, Department of Epidemiology, JHSPH
Masters and PhD in Ecology
Email: cmckee7@jhu.edu Web: http://clifmckee.github.io
Rupshikha Sen
MSc Graduate Student, Department of Epidemiology, BSPH
DDS, The Dayananda Sagar Institutions, Bangalore
Email: rsen6@jhmi.edu
Please introduce yourself!
Find the “introductions” channel on Slack: https://intro-to-r-june2023.slack.com/
Learning a programming language can be very intense and sometimes overwhelming.
We recommend fully diving in and minimizing other commitments to get the most out of this course.
We want you to succeed – We will get through this together!
R is a language and environment for statistical computing and graphics developed in 1991
R is an open source implementation of the S language, which was developed by Bell Laboratories in the 1970s.
The aim of the S language, as expressed by John Chambers, is “to turn ideas into software, quickly and faithfully”
[source: http://www.r-project.org/, https://en.wikipedia.org/wiki/S_(programming_language), https://en.wikipedia.org/wiki/Bell_Labs]
Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand developed R
R is both open source and open development

[source: http://www.r-project.org/, https://en.wikipedia.org/wiki/R_(programming_language)]
Free (open source)
High level language designed for statistical computing
Powerful and flexible - especially for data wrangling and visualization
Extensive add-on software (packages)
Strong community
[source: https://rladies-baltimore.github.io/]
Little centralized support, relies on online community and package developers
Annoying to update
Slower, and more memory intensive, than the more traditional programming languages (C, Perl, Python)
[source - School vector created by nizovatina - www.freepik.com]
What do you hope to get out of the class?
Why do you want to use R?
[Photo by Nick Fewings on Unsplash]
http://jhudatascience.org/intro_to_r
Materials will be uploaded the night before class. We are constantly trying to improve content! Please refresh/download materials before class.
https://courseplus.jhu.edu/core/index.cfm/go/syl:syl.public.view/coid/20749/
End of class Survey from JHU - link in email.
[source - Banner vector created by pch.vector - www.freepik.com]
Homework and Final Project due by January 24th at 11:59pm ET.
If you turn homework in early, we may be able to give you feedback sooner.
Note: Only people taking the course for credit must turn in the assignments. However, we will evaluate all submitted assignments in case others would like feedback on their work.
If you can, we suggest working virtually with a large monitor or two screens. This setup allows you to follow along on Zoom while also doing the hands-on coding.
Install the latest R version (4.3.2 (called ‘Eye Holes’) as of October 31, 2023)
More detailed instructions on the website.
RStudio is an integrated development environment (IDE) that makes it easier to work with R.
More on that soon!
This course will involve moving files around on your computer and downloading files.
If you are new to this - check out these videos.
If you have a PC: https://youtu.be/we6vwB7DsNU
If you have a Mac: https://www.youtube.com/watch?v=Ao9e0cDzMrE
You can find these on the resource page of the website.
R jargon: https://link.springer.com/content/pdf/bbm%3A978-1-4419-1318-0%2F1.pdf
Package - a package in R is a bundle or “package” of code (and possibly data) that can be loaded together for easy repeated use or for sharing with others.
Packages are analogous to a software application like Microsoft Word on your computer. Your operating system allows you to use it, just like having R installed (and other required packages) allows you to use packages.
Function - a function is a piece of code that allows you to do something in R. You can write your own, use functions that come directly from installing R, or use functions from additional packages.
You can think of a function as a verb in R.
A function might help you add numbers together, create a plot, or organize your data. More on that soon!
sum(1, 20234)
[1] 20235
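As a minimal sketch of "writing your own" function, the example below defines a small function and calls it. The name add_numbers is hypothetical and chosen only for illustration; for real work the built-in sum() is the better choice.

```r
# Hypothetical function, for illustration only: adds two numbers.
add_numbers <- function(x, y) {
  x + y
}

# Calling our function gives the same result as sum(1, 20234).
result <- add_numbers(1, 20234)
result  # 20235
```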
Argument - what you pass to a function
sum(1, 20234)
[1] 20235
round(0.627, digits = 2)
[1] 0.63
round(0.627, digits = 1)
[1] 0.6
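A short sketch of how arguments can be passed, using the same round() call as above. The variable names (by_name, by_position, default_digits) are made up for this example.

```r
# Passing the digits argument by name.
by_name <- round(0.627, digits = 2)

# Passing the same argument by position (second argument is digits).
by_position <- round(0.627, 2)

# Leaving digits out uses its default value of 0.
default_digits <- round(0.627)

by_name         # 0.63
by_position     # 0.63
default_digits  # 1
```

Naming arguments is optional here, but it tends to make code easier to read.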
Object - an object is something that can be worked with or on in R - can be lots of different things! You can think of objects as nouns in R.
… many more
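As a rough illustration of objects as "nouns", the sketch below creates a few objects with the assignment arrow <- and then hands one to a function. The object names and values are invented for the example.

```r
# Each line creates an object and gives it a name.
temperature <- 98.6         # a single number
species <- "setosa"         # text (a character string)
counts <- c(3, 5, 8)        # a vector holding several numbers

# Objects (nouns) can be passed to functions (verbs).
total <- sum(counts)
total           # 16
class(species)  # "character"
```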
examples: temperature, length, count, color, category
examples: people, houses, viruses etc.
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
[source]
Sample = Row
Variable = Column
Data objects that look like this are often called data frames.
Fancier versions from the tidyverse are called tibbles (more on that soon!).
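To connect "Sample = Row" and "Variable = Column" to code, here is a small sketch that inspects the built-in iris data frame using base R functions only. The variable names n_samples and n_variables are made up for the example.

```r
# iris ships with R, so nothing needs to be installed.
class(iris)               # "data.frame"

n_samples <- nrow(iris)   # number of rows (samples)
n_variables <- ncol(iris) # number of columns (variables)
n_samples                 # 150
n_variables               # 5

# Column (variable) names.
names(iris)

# Pull out one column with $ and look at its first three values.
iris$Sepal.Length[1:3]    # 5.1 4.9 4.7
```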
We will mostly show you how to use tidyverse packages and functions.
This is a newer set of packages designed for data science that can make your code more intuitive than the original, older base R.
Tidyverse advantages:
- consistent structure - making it easier to learn how to use different packages
- particularly good for wrangling (manipulating, cleaning, joining) data
- more flexible for visualizing data
Packages for the tidyverse are managed by a team of respected data scientists at Posit.

See this article for more info.
We will go through this in the lab.
How you install a package differs depending on the source (CRAN, GitHub, etc.).
Installation must be done once for each installation of R.
You can install packages from CRAN using the Tools menu in RStudio:
Tools > Install Packages...
Type in the package name to install.
We use a function called install.packages() for CRAN packages.
Here is an example where we “install” the dplyr package:
install.packages("dplyr")
The package name is enclosed in quotation marks.
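Because installation only needs to happen once, it can be handy to check whether a package is already there before installing. The sketch below is one possible way to do that; is_installed is a hypothetical helper written for this example, not a function that comes with R. It deliberately prints a reminder instead of installing, so it is safe to run anywhere.

```r
# Hypothetical helper (not part of R): TRUE if a package is installed.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# The stats package ships with R, so this check succeeds.
has_stats <- is_installed("stats")

# Only remind about installing dplyr when it is missing.
if (!is_installed("dplyr")) {
  message('dplyr not found: run install.packages("dplyr")')
}
```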
After installing packages, you will need to “load” them into memory so that you can use them.
This must be done every time you start R.
We use a function called library() to load packages.
Here is an example where we “load” the dplyr package:
library(dplyr)
Quotation marks are optional.
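A small sketch of loading a package, with and without quotation marks. It uses the tools package instead of dplyr purely so the example runs without installing anything (tools ships with R but is not loaded by default); the file name is made up.

```r
# Both forms load the same package.
library(tools)    # unquoted name
library("tools")  # quoted name

# Once loaded, the package's functions can be called directly.
ext <- file_ext("results.csv")
ext  # "csv"

# The package::function form works even without calling library() first.
ext2 <- tools::file_ext("results.csv")
```

The same pattern applies to dplyr once it has been installed.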
Found on our website under the Resources tab: https://jhudatascience.org/intro_to_r/resources.html
Want more?
Tidyverse Skills for Data Science Book: https://jhudatascience.org/tidyversecourse/ (more about the tidyverse, some modeling, and machine learning)
Tidyverse Skills for Data Science Course: https://www.coursera.org/specializations/tidyverse-data-science-r
(same content with quizzes, can get certificate with $)
R for Data Science: http://r4ds.had.co.nz/
(great general information)
R basics by Rafael A. Irizarry: https://rafalab.github.io/dsbook/r-basics.html (great general information)
Open Case Studies: https://www.opencasestudies.org/
(resource for specific public health cases with statistical implementation and interpretation)
Dataquest: https://www.dataquest.io/
(general interactive resource)
Need help?
Various “Cheat Sheets”: https://www.rstudio.com/resources/cheatsheets/
R reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
R jargon: https://link.springer.com/content/pdf/bbm%3A978-1-4419-1318-0%2F1.pdf
R vs Stata: https://link.springer.com/content/pdf/bbm%3A978-1-4419-1318-0%2F1.pdf
R terminology: https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf
Interested in Reproducibility?
Check out Candace’s courses:
tidyverse can help make R more intuitive.
[Image by Gerd Altmann from Pixabay]